<p>
  Implement a CUDA program that performs sparse matrix-vector multiplication.
  Given a sparse matrix \(A\) of dimensions \(M \times N\) and a dense vector \(x\) of length \(N\),
  compute the product vector \(y = A \times x\), which has length \(M\). <code>A</code> is stored as a full
  \(M \times N\) array in row-major order, so element \(A_{ij}\) is located at <code>A[i * N + j]</code>.
  <code>nnz</code> is the number of non-zero elements in <code>A</code>.
</p>

<p>
  Mathematically, the operation is defined as:
  \[
  y_i = \sum_{j=0}^{N-1} A_{ij} \cdot x_j \quad \text{for} \quad i = 0, 1, \ldots, M-1
  \]
</p>
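<p>
  A direct serial translation of the formula is useful as a correctness reference when testing a GPU kernel.
  The sketch below is host-only code (valid in a <code>.cu</code> file), assumes <code>float</code> elements stored
  row-major as described above, and uses a hypothetical helper name, <code>spmv_reference</code>.
</p>

```cuda
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference: y_i = sum_{j=0}^{N-1} A[i*N + j] * x[j].
// A holds all M*N elements (zeros included) in row-major order.
std::vector<float> spmv_reference(const std::vector<float>& A,
                                  const std::vector<float>& x,
                                  int M, int N) {
    assert(A.size() == static_cast<std::size_t>(M) * N);
    assert(x.size() == static_cast<std::size_t>(N));
    std::vector<float> y(M, 0.0f);
    for (int i = 0; i < M; ++i) {
        float sum = 0.0f;
        for (int j = 0; j < N; ++j) {
            // Row i starts at offset i*N in the row-major array.
            sum += A[static_cast<std::size_t>(i) * N + j] * x[j];
        }
        y[i] = sum;
    }
    return y;
}
```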

<p>
  The matrix \(A\) is approximately 60-70% sparse.
</p>
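<p>
  As a rough sizing estimate derived from the stated sparsity \(s\) (the exact count varies between inputs),
  the number of non-zero elements is approximately
  \[
  \text{nnz} \approx (1 - s)\, M N, \qquad s \in [0.6,\ 0.7]
  \]
  so at the upper bound \(M = N = 10{,}000\) there are roughly \(3 \times 10^7\) to \(4 \times 10^7\) non-zeros
  among the \(10^8\) stored elements.
</p>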

<h2>Implementation Requirements</h2>
<ul>
  <li>Use only CUDA native features (external libraries are not permitted)</li>
  <li>The <code>solve</code> function signature must remain unchanged</li>
  <li>The final result must be stored in vector <code>y</code></li>
</ul>
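<p>
  A minimal sketch of one possible kernel follows. It assumes the harness calls
  <code>solve(const float* A, const float* x, float* y, int M, int N, int nnz)</code> with device pointers;
  that signature is an assumption, so keep whatever signature the provided template actually uses. The kernel
  name <code>spmv_warp_per_row</code> is hypothetical. Each warp computes one element of \(y\): because
  \(A\) is row-major, the 32 lanes of a warp read adjacent addresses within one row, which keeps global-memory
  loads coalesced, and the partial sums are combined with <code>__shfl_down_sync</code>.
</p>

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical kernel: one warp per row of A (row-major, M x N, zeros stored).
__global__ void spmv_warp_per_row(const float* A, const float* x, float* y,
                                  int M, int N) {
    const int lane = threadIdx.x & 31;                 // lane index within the warp
    const int warpsPerBlock = blockDim.x / 32;
    const int row = blockIdx.x * warpsPerBlock + threadIdx.x / 32;
    if (row >= M) return;                              // row is warp-uniform, so whole warps exit

    const float* rowPtr = A + static_cast<std::size_t>(row) * N;
    float sum = 0.0f;
    // Lanes stride across the row; consecutive lanes touch consecutive addresses.
    for (int j = lane; j < N; j += 32) {
        sum += rowPtr[j] * x[j];
    }
    // Tree reduction of the 32 per-lane partial sums; lane 0 ends with the total.
    for (int offset = 16; offset > 0; offset >>= 1) {
        sum += __shfl_down_sync(0xffffffffu, sum, offset);
    }
    if (lane == 0) y[row] = sum;
}

// Assumed signature: A, x, y are device pointers. nnz is not needed when A is read densely.
void solve(const float* A, const float* x, float* y, int M, int N, int nnz) {
    (void)nnz;
    const int threadsPerBlock = 256;                   // 8 warps per block (multiple of 32)
    const int warpsPerBlock = threadsPerBlock / 32;
    const int blocks = (M + warpsPerBlock - 1) / warpsPerBlock;
    spmv_warp_per_row<<<blocks, threadsPerBlock>>>(A, x, y, M, N);
    cudaDeviceSynchronize();
}
```

Since \(A\) arrives as a full dense array, every element must be read at least once no matter how many are zero, so this sketch multiplies through the zeros and ignores <code>nnz</code>. Converting to a compressed format such as CSR first would itself require a full pass over \(A\), so it would likely pay off only if the same matrix were reused across many products.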

<h2>Example</h2>
<p>
Input:<br>
Matrix \(A\) (\(3 \times 4\)):
\[
\begin{bmatrix}
5.0 & 0.0 & 0.0 & 1.0 \\
0.0 & 2.0 & 3.0 & 0.0 \\
0.0 & 0.0 & 0.0 & 4.0
\end{bmatrix}
\]
Vector \(x\):
\[
\begin{bmatrix}
1.0 \\
2.0 \\
3.0 \\
4.0
\end{bmatrix}
\]
Output:<br>
Vector \(y\):
\[
\begin{bmatrix}
9.0 \\
13.0 \\
16.0
\end{bmatrix}
\]
</p>
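<p>
Each output entry is the dot product of one row of \(A\) with \(x\); only the non-zero entries of a row contribute:
\[
\begin{aligned}
y_0 &= 5.0 \cdot 1.0 + 1.0 \cdot 4.0 = 9.0 \\
y_1 &= 2.0 \cdot 2.0 + 3.0 \cdot 3.0 = 13.0 \\
y_2 &= 4.0 \cdot 4.0 = 16.0
\end{aligned}
\]
</p>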

<h2>Constraints</h2>
<ul>
  <li>1 &le; <code>M</code>, <code>N</code> &le; 10,000</li>
  <li>The matrix \(A\) is approximately 60-70% sparse (i.e., 60-70% of elements are zero)</li>
</ul> 